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Abstract The development of statistical methods and 
numerical algorithms for model choice is vital to many 
real-world applications. In practice, the ABC approach 
can be instrumental for sequential model design; how- 
ever, the theoretical basis of its use has been ques- 
tioned. We present a measure-theoretic framework for 
using the ABC error towards model choice and describe 
how easily existing rejection, Metropolis-Hastings and 
sequential importance sampling ABC algorithms are 
extended for the purpose of model checking. We con- 
sidering a panel of applications from evolutionary biol- 
ogy to dynamic systems, and discuss the choice of sum- 
maries, which differs from standard ABC approaches. 
The methods and algorithms presented here may pro- 
vide the workhorse machinery for an exploratory ap- 
proach to ABC model choice, particularly as the appli- 
cation of standard Bayesian tools can prove impossible. 
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1 Introduction 

Approximate Bayesian Computation (ABC) methods 
have appeared in the past ten years as a way to handle 
intractable likelihoods and posterior densities 



tt(9\x ) cx f(x o \6M0) 



(1) 



that arise under high dimensional, data-generating mod- 
els. For example, complex coalescent models are known 
to generate, currently, latent structures that are too 
high dimensional to bring a reliable numerical approx- 
imation in practical computer time. Originally devel- 
opped for population genetics, ABC has since been ap- 
plied to many applied problems where Bayesian analy- 
sis has long been contemplated but previously remained 



elusive (Beaumont 2010 Marin et al 2011). 



As with other approximation methods like varia- 



tional Bayes ( Jaakkola and Jordan 



2000 



MacKay 2002) 



or indirect inference ( Heggland and Frigessi I I2004D , ABC 



suffers from a limited ability to quantify the uncertainty 
in the approximation of the posterior (JlJ. Moreover, 
the loss of information brought by the ABC approxi- 
mation implies that the application of parts of standard 
Bayesian machinery, such as the Bayes factor, is fraught 



with difficulties (Robert et al 2011) 



While much of Bayesian model checking is based 
on evaluating model predictions, ABC uses such model 
predictions for parameter inference. In this perspective, 
it is natural to attempt using the pseudo-data x that 
is generated by ABC Monte Carlo algorithms both for 
parameter inference and model assessment. There is no 
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issue of bias in doing so because we are considering a 
simulation technique rather than an inferential method: 
the data itself is only "used once". Some of us have 
called for some technical refinement of this framework, 



called ABC under model uncertainty (ABC/j,) (Robert 



et al 2009), particularly as this technique can gener- 



ate further insight in practice (Drovandi et al 2011 



Ratmann et al 20101. In opposition to more formal 



Bayesian model choice approaches (Toni et al 2008 



Grelaud et al 2009), a key to the validation of ABC as 



a model assessment in Ratmann et al (2009) relies on 



the very fact that likelihood computations and compar- 
isons under a model can be replaced by assessing the 
amount of fit between simulations x from that model 
and the observed data xq. 

In this paper, we first present technical modifica- 
tions that reflect our consensus view on ABC under 
model uncertainty. We then describe basic, yet efficient 
Metropolis-Hastings and sequential importance sampling 
ABC algorithms for approximate parameter inference 
and model checking, which forms the main contribution 
of this paper. On purpose, these algorithms are closely 
related to existing, popular ABC algorithms to show 
how easily these methods can be extended to incorpo- 
rate model checking at no or little additional computa- 
tional cost. These algorithms are presented in Section[3j 
following a description of their theoretical foundations 
in Section[2] We illustrate these algorithms on a panel of 
applications, ranging from population genetics to net- 
work evolution and dynamic systems (Sections 4-6). We 
discuss the relative advantages of both algorithms, and 
compare these to a hybrid algorithm that seeks to com- 
bine the strengths of either method. Given the difficul- 
ties associated with using approximate Bayes factors 
for model choice, we conclude that the algorithms pre- 
sented in this paper may provide the workhorse machin- 
ery for a viable, exploratory approach to model choice 
when the likelihood is computationally intractable (Sec- 
tion [F}. 



2 A measure-theoretic framework for ABC 

To formalize the setting of ABC-led inference with an 
application to diagnostic model assessment in mind, we 
first present a measure-theoretic framework that up- 



dates the previous formulation in Ratmann et al (2009) 



The extended ABC algorithms in Section [3] also handle 
goodness-of-fit type analyses and follow immediately 
from this re-interpretation of ABC. 



Algorithm rejABC 

on x X to sample from Eq.[2] 

rejABCl Sample 8 ~ tt($), simulate x ~ /(• \9) and 

compute e = p(S(x), S(xo)J . 
rej ABC2 Accept (9, x) with probability proportional to 

k(e; t), and go to rejABCl. 



Algorithm rejABC/x 

on x M K to sample from Eq.[6] 

rejABC/xl Sample 6 ~ simulate x ~ /(• \6) and 

compute e fe = p fe (S k (x),S k (x )), k = 1, . . . , K. 

rejABC/x2 Accept {6,s 1 . K ) with probability propor- 
tional to TT fe t(e*:i r fe)i an d g° to rejABCpl. 



Table 1 Rejection samplers for ABC and ABC/i. 



2.1 The ABC approach 

As in most ABC settings, we suppose that pseudo- 
data x € X can be efficiently simulated for any vec- 
tor of model parameters 9 £ O from a data-generating 
process / that defines, perhaps implicitly, the likeli- 
hood. We also consider a set of K summaries § = 
{Si, . . . , Sk }, a real- valued distance function p, and 
the one-dimensional ABC kernel 

K(e;r) = 1/rl {|e| < r/2} 

with tolerance r > 0. To circumvent likelihood evalua- 
Pritchard et al (1999) first proposed the rejection 



tions, 

sampler rejABC in Table [l]A 

The target density of rejABC on the augmented 
space x X is therefore 

n T {9,x\x ) cx K^p(S(x),S(x ));r^f(x\e)7r(e). (2) 

In typical applications, the auxiliary variable x is ex- 
tremely high-dimensional, and lower-dimensional sum- 
maries are used to compare the simulated data with the 
observed data. The realized error e — p(S(x),S(xo)) is 
then accepted by the ABC algorithm when within a 
prescribed tolerance r. ABC is a valid, non-parametric 
estimation method in that, as t — > 0, the marginal den- 
sity tt t (9\xq) of ([2]) approaches the true posterior distri- 
bution |l]) if the summaries are sufficient for 9 under the 
model. Otherwise, Q converges to the posterior distri 
bution tt(9\E>(xq)) when r — > 0. Fearnhead and Prangle 



( |2010[ ) and pean et~aT| ( |2011| ) show that an ABC-based 
inference is converging (in the number of observations) 
if the parameter 9 is identifiable in the distribution of 
S(x). 
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2.2 ABC on error space 

The error e computed in algorithm rej ABC (Table [T]^) 
is, in fact, a compound error term that may reflect both 
stochastic fluctuations in simulating from / as well as 
systematic biases between / and the data. To exploit 
this information, we reformulate ABC as providing sim- 
ulations on the joint space of model parameters (9) and 
summary errors {s^x)- Algorithm rejABCu in Table[Tj3 
uses the projection 



x — > e 



1:K 



(ei, . . . ,£fr), 



£fc = Pk(Sk(x), Sk(xo)), which induces the image mea- 
sure (abusively denoted by) 



L ,e(Ei x ... x E K ) 

f(dx\0) 



(4) 



I&o(EiX...xE k ) 

on the associated Borel image a-algebra, conditional on 
Xq, 9. The density of Q with respect to a suitable mea- 
sure on the if-dimensional error space will be denoted 
(again abusively) by 

Ii K , m- 



£x ,6 



■V.K 



(5) 



This multi-dimensional error density is thus the image 
of the sampling density f(-\0) by the transform £ XOt $. It 
can be interpreted as the prior predictive error density 
conditional on 9. 

Example 1 In many applications, pseudo-data x is sim- 
ulated on a finite space X. Then, f(dx\9) is a counting 
measure, say 

AT, 

f(dx\9) = ^Tf l S Xt (dx). 

i=l 

Hence, the image measure t; XOt g(de) is again a counting 
measure, say 

£,x ,e(ds) = ^2^ j 5 ej (de), 

3=1 

where N E is the size of ^ Xo m(X), < N e < N x < oo. 

From a computational perspective, ^ is in prac- 
tice intractable. In direct analogy to ABC, we circum- 
vent numerical evaluations of ^ through simulating 
from this density as illustrated in algorithm rejABCu 
in Table[Tj3. The target density of rejABC/i on the aug- 
mented space x M. K is 



(6) 



By construction, the marginal target densities tt t (9\xq) 
of rejABC and rejABC/i coincide if k(/j(S(x), S(xo); t) 
in rejABC can be written, up to a constant of propor- 
tionality, as Yl k n(p k (S k (x), S k (x );T k ) (to be used in 
rejABC//) for some choice of r k ; see the Appendix. For 
example, if ABC is run with the standard indicator ker- 
nel, this is equivalent to using the Manhattan distance 



{pk(S k (x),S k (x ))} 



and r k = t. The utility of this reformulation was first 



(3) discussed in Ratmann et al| ( 2009 ) : it is possible to re- 



late the marginal density 7r T (£ 1:Jf |a;o) of (|6| to standard 
Bayesian error measures. The marginal ABC error den- 
sity can be understood as the prior predictive error den- 



sity (Box 19801 that is re- weighted by error magnitude 



(7) 



3 Extending existing ABC algorithms 

Existing ABC algorithms are easily extended to sample 
from n T (9,e 1 . K \xo) for the purpose of parameter infer- 
ence and model assessment. 

In Ta ble[2) we contrast the Metropolis-Hastings ABC 
sampler (Marjoram et al 2003) to its extension that 
samples from ([6|). To demonstrate the validity of algo- 
rithm mhABC/i, let z = (9,s 1 . K ) and note that, on the 
augmented space, the proposal density of mhABC/i is 
q(z — !• z') = £,x .e'(£'i-K) ( l(^ 6')- Therefore, detailed 



balance is satisfied precisely for ir T 



'1:K 



x ): 



mh(z,z f ) = g(0'->0) <0'm k K{e' k \Tk) 
mh(z',z) q(9^9>)ir(9)i\ k K(e k ;T k ) 

q(z ->• z') tt(0) n fc t(e k ;T k ) £ X0 ,e(£ 1:K ) 

_ q{z' z) n T (z'\x ) 
q{z z') 7r T (z|x ) ' 



Bortot et al ( 2007 1 proposed a Metropolis-Hastings sam- 
pler on the space x X x [0, oo) for the purpose of 
parameter inference when the tolerance r is by design 
a random variable. For clarity, we note that this algo- 
rithm requires an extra proposal density q(r — ¥ r'), and 
has a target density different to both ^ and ([6| . 

In Table [3j we contrast the popular sequential im- 
portance sampler (SIS) for ABC ( |Toni et al| 12008] ) to its 
extension that samples from To keep the particle 
system at the nth stage alive, these algorithms augment 
the state space with an additional random variable P n € 
{1, . . . , N} . This is the ancestor index of the particle at 
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Algorithm sisABC 

on N x x X to sample, marginally, from Eq.[2] 

Set the initial particle system at n = 1: for 

% = 1,...,N compute Q\ ~ n(B), x\ ~ /(-^i), 

ef = p(§Oi),S(>o)), and then, for i = 1,...,N, 
Wl=K(e{;Tx)/ Ef=i ^^n). 

For n = 2, . . . ,n*, do: 

Set i = 1, c n = re(0; r„) and repeat: 

sisABCl Propose the ith ancestor index I' from 
i = 1,...,N with probabilities W£_ l5 0' ~ 

M„(6>^_ i; -), a:' ~ /H^') and compute e' = 

p(S(*').S(*o)). 

sisABC2 With probability re(e'; r n )/c„, set 

{I 1 n ,B\ l ,x' l n ) <— (I' ,8' ,x'), compute the unnor- 
malized weight 



= *(fi{,)/E^-i^»(Cii 



and increment i <— i + 1. If 
Else return to sisABCl. 
sisABC3 Compute the normalized weights 



N, go to sisABC3. 



'£-< 



and update n <— n + 1. If n < n* , go to sisABCl. 



Algorithm sisABC/i 

on N x x K K to sample, marginally, from Eq.[(i] 

Set the initial particle system at n = 1: for 
i = l,...,iV compute flj ~ tt(0), ~ /(-^i), 
e ife = Pk(S k (x{), S k (x )), and then, for i = 1,...,N, 
Wl = U k <e\ k ;r lk ) / U k <e{ k ;r lk ). 



For n 
Set 



1, c„ 



, do: 

n fe «(0; T„fc) and repeat: 



sisABC/xl Propose the ith ancestor index /' from 
i = 1,...,N with probabilities W^_ x , 9' ~ 

Mn(Ci;')> x ' ~ /H^') and compute e' fe = 
Pkls k (x'),S k (xo)) for all fc. 
sisABC/^2 With probability FJ, «(el; r nk )/c n , set 



«1 



n> »m °n.,l:K 
malized weight 



) <— (/', 0', e^, „), compute the unnor- 



T«)/E W »-1 M " 
3 = 1 



JV, go to sisABC/i3. 



and increment i <— i + 1. If 
Else return to sisABC/il. 
sisABC/^3 Compute the normalized weights 



'£< 



and update n <— n + 1. If n < n* , go to sisABC/^1. 



Table 3 Vanilla sequential importance samplers for ABC and ABC/j. Both algorithms require to specify a decreasing sequence 
of tolerances r„ for n = 1, . . . ,n*. The first tolerance ri is here set large enough such that Wl > for all i = 1, . . . , AT, and 
subsequent ones can be set automatically | |Del Moral et alj |2008| ) or according to an annealing scheme. Using the indicator 
kernel, the acceptance probability in sisABC2 (and sisABC/i2) is either zero or one (|Toni et al||2 008]). Typically, t he variance 
of the proposal kernel M n is modified at each stage to improve the convergence of the algorithm < |Beaumont et al||2010| ). 



stage n — 1 from which the ith particle at stage n is de- 
rived. Although we are only interested in the marginal 
target density of sisABC/i on the space x M. K , the val- 
idation of sisABC/x as a proper self-normalised impor- 
tance sampler with respect to n T (0,e 1 . K \xo) proceeds 

We note that at stage 



as in 

n > 



Beaumont et al (2010) 



1, proposed particles follow the law q n (z n ,I n ) = 
£x ,e n (£n,i:K)M n (6 n ,6 I n "_ 1 )Wn! 1 . However, after inte- 
grating out the index /„ and accounting for the weight 
correction, we have marginally for any 7r r -integrable 



function h that 
E^(W n h(z n )) 



N 



k) x 



3=1 



)v{z n -i)dz n -xdz n 
h(z n ) 7r(6>„) i\ k K.{e n ,k; T n . k ) ( xo ,e n (Su.i-.k) x 
v(z n _x)dz n -idz n 

ocE^(/i(z„)), 

independently of the distribution of the previous z n — \. 
The algorithm is therefore a proper importance sam- 
pling scheme and the proposal kernel M n can be adapted 



to the previous z n -i (Beaumont et al 20101. We note 



that the proposed method of resampling ancestor in- 
dices is more generally known as sequential importance 
sampling with Rao-Blackwellized rejection control, and 
has been successfully applied to a variety of complex 



inference problems (Liu and Chen 1998). 
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Fig. 1 Discrepancies between several network evolution models and protein network data from T. pallidum. We show results for 
two alternative evolution models and two different observation models, (A) DD+LNK+PA-L, (B) DD+PA-L, (C) DD+PA-BP; 
results for DD+LNK+PA-BP are similar to (A) and not shown. Left column: trajectories of the CC error for four Markov chains 
that are generated in parallel. Convergence was most difficult to achieve for the model in (A) and involved re-parameterization 
because the support of some parameters spanned several orders of magnitude. Right columns: two dimensional estimates of 
the seven dimensional ABC error density, as reproduced with average shifted histograms. 
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Algorithm mhABC 

on & x X to sample from Eq.[2] 

Set initial values 9° and compute x° ~ /( • \9°). 



mhABCl If now at 6 propose a move to 9' according to 

a proposal density q(9 —>#'). 
mhABC2 Simulate x' ~ f(-\9',M) and compute e' = 

p(§(x'),§(x )). 
mhABC3 Accept (9',x') with probability 



mh(9,x;9',x') 

■ Ji 1^' 

mm < 1 , — — 



tt(0') K(e';r) 



^<9') 7r(6») «(e; r) 
and otherwise stay at (9, x). Return to mhABCl. 



Algorithm mhABCji 

on x M K to sample from Eq.[6] 

Set initial values 9° and compute = 
p k (S k (x°),S k (x )) where x°~f(-\0°). 

mhABC/^1 If now at 9 propose a move to 9' according 

to a proposal density q(9 — > 9'). 
mhABC/^2 Simulate x' ~ f('\6') an d compute e' k = 

pk (Sk(x'), S k (x )), k = 1, . . . ,K. 
mhABC 3 Accept (9',e / 1 . K ) with probability 

mh(9,e 1:K ;9' ,e[. K ) = 

n(9') n fc «(4;^) ] 

9-+ff') tt(6>) n fe «(efe;T fc ) /' 

and otherwise stay at (f,^.^). Return to 
mhABC/il. 



min < 1 



Table 2 Vanilla Metropolis-Hastings samplers for ABC and 
ABC/^. We include the summary error e in algorithm mhABC 
to emphasize that all the computations required for ABCp are 
already performed in the corresponding ABC algorithm. 



4 Application of mhABC/i to network evolution 

We previously proposed a different ABC Metropolis- 
Hastings sampler to simultaneously fit and assess the 



adequacy of a model against the data (Ratmann et al 



20091. This algorithm combined the marginal densi- 
ties ^ Xo ,d{£k) = J £,x Q M{ £ i:K)d£-k in an ad-hoc manner, 
thereby losing the true dependency structure between 
the ABC errors in Q. Marginally on error space, algo- 
rithms mhABC/z and sisABC/i sample from the joint 
distribution of ABC errors. In this section, we re-visit 
the examples on which our original ideas were devel- 
opped, and illustrate how iterative model design may 
benefit from the ability to sample from the joint error 
distribution 7r r (s 1 . K \xq). 



When large-scale, protein-protein interaction data 
became available, it came as a surprise that their topo- 
logical features deviate markedly from those expected 
under standard random graphs. A variety of simple 
mathematical models were subsequently proposed to 
explain some of these unexpected topological features 



( Stumpf et al 2007 1 . These models grow networks iter- 



atively node by node until a given finite network size is 
reached. The preferential attachment model (PA) was 
able to reproduce, roughly, the observed fat-tailed em- 
pirical distribution of node degrees (number of outgo- 
ing edges of a node). Subsequently, more biologically 
plausible models based on the duplication of nodes and 
(local) link divergence among the duplicates (DD), as 
well as (global) link rewiring were proposed (LNK). 
We analyzed mixtures of these mechanisms on protein 
network data that was obtained with high-throughput 



bait-prey technologies (Ratmann et ah 2009). Briefly, 



the DD+PA mixture model has three parameters, and 
DD+LNK+PA has five parameters. Available network 
data reflects only a subset of the true interaction net- 
work. To interface the evolution models with data, sev- 
eral observation models have been formulated. Given 
a simulated complete network, we can randomly sam- 
ple links until the observed number of links is matched 
(L). Alternatively, we can mimick the experimental de- 
sign by randomly sampling bait and prey proteins until 
the observed numbers are matched, and record asso- 
ciated links (BP); see the Appendix for details. These 
sampling schemes do not add any parameters. All prior 
densities are uninformative. 

To compare observed network data xq to the corre- 
sponding simulations x, we use seven summary statis- 
tics that reflect local and global properties of the net- 
work topology: average node degree (ND); within-reach 
distribution (WR) , the mean probability of how many 
nodes are reached from one node within distance k = 
1,2,..., where distance is the minimum number of edges 
that have to be visited to reach a node j from node 
i; diameter (DIA), the longest minimum path among 
pairs of nodes in a connected component; cluster co- 
efficient (CC), the mean probability that two neigh- 
bours of a node are themselves neighbours; fragmen- 
tation (FRAG), the percentage of nodes not in the 
largest connected component; log connectivity distri- 
bution (CONN), log (p{k u k 2 )ND 2 )/(k lP (k 1 )k 2P (k 2 )), 
the depletion or enrichment of edges ending in nodes of 
degree ki , fc 2 relative to the uncorrelated network with 
same node degree distribution p(k); box degree distri- 
bution (ODBOX), the probability distribution of boxes 
with k edges to nodes outside the box. The choice of 



these summaries is discussed in (Ratmann et al 2007). 
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We applied algorithm mhABC/i in Table [2j3 with 
annealing on the ABC thresholds and the variance of 
the Gaussian proposal density q(9 — > 9') across a set 
of network evolution models. The sampler converged 
rapidly to the target density ([6]), as assessed by four par- 
allel Markov chains that were started at overdispcrscd 
values. Markov chains were highly autocorrelated. Fig- 
ure [l] displays representative Markov chain trajectories, 
and two dimensional estimates of the seven dimensional 
ABC error density for three different models. 

Inspection of the multi-dimensional ABC error den- 
sity enabled us to diagnose specific deficiencies in mod- 
els of network evolution. In our approach to ABC, each 
error Sk corresponds directly to one summary, thereby 
enabling targeted model refinement. For example, Fig- 
ure [T]\ illustrates that the fitted link rewiring model 
with link subsampling (DD+LNK+PA-L) fails to match 
local connectivity patterns (CC) in the observed Tre- 
ponema pallidum network ( Titz et al 2008[ ) while it re- 
produces global topological patterns (ND, WR). This 
prompted us to remove the link rewiring component 
and to consider DD+PA-L, which results in some im- 
provement (Figure |Tj3) . We also replaced link subsam- 
pling with bait-prey sampling. The fitted DD+PA-BP 
model is consistent with the seven considered aspects 
of the data in that the point e x . K = is well in the 
support of the ABC error density (Figure [T}U). 



5 Application of sisABC/x to dynamic systems 

The behavior of complex dynamic systems is often de- 
scribed by sets of non-linear ordinary or stochastic dif- 
ferential equations. Typically, these systems are only 
partially observed in aggregated form, and associated 
models are not analytically solvable and may fail to 
reproduce important aspects of the sytem's dynamics. 



scribe observed disease patterns. We consider a stochas- 
tic SIRS model defined by the transmission rates 



Toni et al (20081 proposed sequential ABC algorithms 
(in Table |3j\) for parameter inference in this setting. In 
this section, we demonstrate with the following example 
that the algorithm in Table [3)3 can expose deficiencies 
of partially observed, nonlinear dynamic models. 

The disease dynamics of influenza infecting humans 
in temperate regions are characterized by explosive, sea- 
sonal epidemics during the winter months and marked, 
irregular variation across consecutive seasons. To un- 
derstand the epidemiology of seasonal influenza, vari- 
ous complex non-linear dynamic models that track the 
number of susceptible (S), infected (I) and recovered/ 
immune (R) individuals in a population have been con- 



sidered (Earn et all 2002 1. Much of this work is mo- 



^= v(N-S)-j3 t p + 'y(N-S-I) 
dl , S T , . T 



(8) 



where p is the birth/death rate, N is the finite popu- 
lation size, v is the recovery rate, 7 the rate by which 
immunity is lost and fit the transmission rate that is 
further assumed to vary seasonally as 

Pt = /3(l + ssin(27rf)). 

In practice, we account for long-term demographic trends 
(thus, N and p are fixed), and reparameterize j3, v, 
7 in terms of the basic reproductive number Rq = 
Pt/iy + /•*)> the average duration of infection per day, 
D = and the average duration of immunity per 
year, r — [3657] _1 ; see the Appendix for further de- 
tails. We fit and contrast Q to weekly Dutch influenza- 
like illness data (xq) of influenza A (H3N2) that was col- 
lected between 1968 — 99 (http://www.nivel.nl/griep/). 
Influenza-like illness data is subject to fluctuating re- 
porting biases. We use a Poisson observation model that 
accounts for unknown reporting biases p and known 
seasonal fluctuations in reporting fidelity f t (see Ap- 
pendix) , 



dx 



= pfr 1 !, 



(9) 



yielding five unknown parameters 9 — (Rq, D 7 r, s, p) 
in total. The prior densities are Rq ~ W(l,20), r ~ 
W(l,160), s ~ ^(0.075,0.6), p ~ ^(0.04,0.4) (unin- 
formative), and D ~ W(2.2,2.8) (informative) ( |Gupta 



et al 19981. We use the standard Euler multinomial 
scheme (Tian and Burrage 2004) to simulate from Jsl 



tivated by the fact that simple models that assume 
gradual loss of immunity to reinfection do not fully de- 



To compare the observed influenza-like illness counts 
xq to simulated data x, we use five summary statis- 
tics that reflect the characteristic dynamic features of 
influenza A (H3N2). Interannual seasonal variation is 
primarily reflected by the cumulative distribution and 
autocorrelation in peak differences (CDF-Z\PK, ACF- 
Z\PK). The explosiveness of seasonal epidemics is re- 
flected by the average duration of an epidemic at or 
above half its peak size (M-EXPL), and we query over- 
all magnitude with the cumulative distribution of peak 
incidence (CDF-PK) and the average annual attack rate 
(M-ATTR). Here, annual attack rates are computed di- 
rectly as the ratio of cumulative influenza-like illness 
counts in a winter season against average population 
size in that season; see the Appendix for further de- 
tails. 
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Fig. 2 Discrepancies between the SIRS model ( 8^9 I and aggregated influenza-like illness counts associated to influenza virus 
A (H3N2) from the Netherlands, 1970-99, as detected by sisABC/t. Empirical data is shown in black, and simulations in blue; 
simulations correspond to one representative, accepted particle of sisABC/i. (A) Incidence data, standardized per 100,000 indi- 
viduals. (B) Cumulative distribution function of differences between successive peaks. (C) Autocorrelations between successive 
peak differences. Simulated incidence data shows too strong and too regular inter-seasonal variation. Algorithm sisABC/i was 
seeded with 100 samples from mhABC/i that were obtained by thinning 500 iterations of 4 chains after burn-in, and then run 
for two more iterations with 1000 particles at the final ABC thresholds. (D) Histogram of the evolution of the ACF-zlPK error 
across importance sampling iterations. (E-F) Two-dimensional estimates of the five-dimensional ABC error density. While the 
fitted model reproduces the magnitude of disease incidence (M-ATTR), it does not match patterns of observed interannual 
seasonal variation (CDF-4PK, ACF-ziPK). 



The sequential algorithm sisABC/i (Table[3f3) read- 
ily detects several shortcomings of the nonlinear stochas- 
tic SIRS model, while fitting it simultaneously to the 
data. Here, we actually seeded sisABC/i with a few 
samples from algorithm mhABC/i taken after burn-in, 
rather than seeding from the prior density. As we dis- 
cuss later, this hybrid approach (hybridABC/i) often 
improves overall efficiency. Our sequential sampler was 
then run for two iterations at the final ABC thresh- 
olds with 1000 particles. We find that model (8|9) pro- 
duces disease dynamics with too regular and too strong 
interannual variation. In Figure [2j we display a rep- 



resentative particle sampled at the last ABC iteration 
and two-dimensional projections of the estimated, five- 
dimensional ABC error density Q. 

We also compared the numerical efficiency of algo- 
rithms mhABC/i, sisABC/i, as well as the hybrid ap- 
proach in terms of simulation effort per effectively in- 
dependent sample from the target density Here, 
we present results on the SIRS model (8j9|. Algorithm 
mhABC/i was run with annealing on the ABC thresh- 
old and the variance of the proposal density. Four chains 
were generated in parallel. For comparison, burn-in of 
mhABC/i is here the number of iterations to anneal 
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burn-in 


ESS/1000 


#sim/ESS 


mhABC/x 

sisABC/x 

hybridABC/i 


4639 [3963, 5178] 
24286 [21117, 26481] 
19263 [18884, 19605] 


60 [49, 75] 
125 [34, 184] 
117 [41, 200] 


859 [705, 1186] 
557 [374, 1834] 
363 [194, 968] 



Table 4 Case study to analyze the performance of algorithms mhABC/x, sisABC/t, and the sequential importance sampler 
that is seeded with mhABC/i (hybridABC/i) on the example in Section[5] The first column gives the burn-in of the algorithms 
under investigation. The second column reports effective sample size per 1000 samples from ^ after burn-in. The third column 
gives the number of total simulations, including burn-in, per effective sample. 



to the final ABC threshold, summed across the four 
chains. The effective sample size (ESS) was computed 
by the method of |Sokal| ( |1989[ ). Algorithm sisABC/it was 
run with 1000 particles for five iterations with anneal- 
ing on the ABC threshold and the variance of the pro- 
posal density. Algorithm hybridABC/i used 100 thinned 
samples from 500 iterations of four Markov chains af- 
ter burn-in to seed a sequential importance sampler 
with 1000 particles for two more iterations at the fi- 
nal ABC threshold; these configurations gave best re- 
sults in terms of the total number of simulations per 
effective sample (#sim/ESS). For both sisABC^t and 
hybridABC/i, burn-in is the number of simulations to 
reach the final iteration of the sequential importance 
sampler, which also counts simulations from mhABC/i. 
Here, ESS was calculated from the particle weights of 
the final iteration. The results in Table E] are obtained 
from 100 replications of all algorithms, and may not 
be directly comparable because the methods to com- 
pute effective samples sizes differ. Comparing overall 
simulation effort (third column) suggests to us that the 
hybrid sampler performs best as it combines rapid con- 
vergence with an efficient method to generate effectively 
independent samples from the target density (first and 
second column). 



6 Application of ABC/j to population genetics 

The ABC errors make possible to contrast a model 
against the real data in absolute terms. While this en- 
ables to successively refine any given model, there is, 
currently, no general approach to compare two mod- 
els based on these errors. We illustrate the interplay 
of model checking and model comparison on a simple 
population genetics example where the true Bayes fac- 
tor can be numerically estimated. 

Population genetics seek to infer aspects of the evo- 
lutionary history of a population of related individuals 
from genetic data. We assume that the different pop- 
ulations evolve from a common ancestral population 
and that a tree topology encodes this history. The dy- 
namics of changing genotype frequencies in populations 
that comply with the tree topology can often be rep- 
resented by the Coalescent process, and methods have 
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Fig. 3 The two evolutionary scenarios considered in Section 
[6] A coalescent process evolves under time over each branch of 
the two scenarios. Left: Model Div23, where Population 3 di- 
verged from Population 2 at time t = 30 Right: Model Divl3, 
where Population 3 diverged from Population 1 at time t = 30 



been developed to infer the associated process param- 
eters from a small sample of genetic data of these pop- 



ulations (Crow and Kimura 1970 Kingman 1982]). 

We consider here the case of three populations hav- 
ing evolved according to one of the two models de- 
scribed in Figure [3j Under both models, some individu- 
als were genotyped from each population at time t = 0. 
Populations 1 and 2 diverged at a known date (t = 60), 
and the deepest divergence event in the tree occurred 
at t = 30. Each sample from one of the three pop- 
ulations is composed of 15 diploid individuals, which 
have been genotyped at 5 independant microsatcllitc 
loci (different genotypes differ in the size of the re- 
peated region). Evolution of these loci over time fol- 
lows the above model, i.e., when a mutation occurs, the 
length of the sequence is increased or decreased by the 



length of the short repeated motif ( Ohta and Kimura 



19731. The per capita mutation rate was set to 0.005 



for all five loci. We only infer the effective population 
size N e , assuming homogeneity across branches of the 
different scenarios. The prior density of N e was set to 
£Y(2,150). This setup is quite simple in order to allow 
computation of the true posterior density with impor- 
tance sampling techniques (combining techniques based 
on 



Stephens and Donnelly 2000 ) 



To perform ABC analysis, the data are summarized 
through 24 statistics. Some of these numerical sum- 
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data set 


Dl 
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numerically estimated posterior probability of Div23 
ABC approximation to posterior probability of Div23 


0.81 
0.88 


0.30 
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Fig. 4 The interplay of model assessment via ABC errors and model comparison via the Bayes factor on the phylogenetic 
simulation study in Section [fj] (A) Estimates of the posterior probability of model Div23, computed with importance sampling 
and ABC, under equal prior probabilities for each model. Comparing both models relative to each other, Div23 is favored 
under data set Dl and Divl3 under D2. (B) Estimates of some two-dimensional ABC error density for both models and both 
data sets. Comparing each models in absolute terms against the data, only model Divl3 matches data set D2 reasonably well. 



maries are sensitive to the distance on the tree topol- 
ogy of two out of the three population samples taken at 
t = 0. We consider Wright's measure of population het- 



are computed as ratios of frequencies of acceptances of 



erogeneity between two populations (FST) (Weir and 



Cockerham 19841, the negative log-likelihood (LIK) 



that samples from one population actually come from 



another population (Rannala and Mountain 1997), as 



well as the shared allele distance between two popu- 



lations (DAS) ( jChakraborty and Jin||1993| ). We write 
FST. a. 6 when FST is computed between samples corre- 
sponding to populations a and b, and likewise for DAS 
and LIK. This gives all in all six summary statistics. 
While FST and LIK are positively correlated with ge- 
netic divergence, DAS is negatively correlated. 

To compare different methods for ABC model choice 
in a controlled setting, we simulated two data sets un- 
der model Div23 with N e — 75 and applied algorithm 
rejABC/x; see the Appendix. Both models are similar 
enough so that the evidence for model Divl3 can be 
larger than the evidence for Div23. Figure |4|\ reports 
importance sampling estimates of the true Bayes fac- 
tor as well as ABC approximations thereof. The for- 
mer are computed by importance sampling approxi- 
mations to both marginal likelihoods, while the later 



simulations from both models, see Robert et al (2011) 



for details. Robert et al (2011) demonstrated that the 



ABC approximation of the Bayes factor does not con- 
verge to the (numerically estimated) Bayes factor with 
increasing sample size and/or increasing computation 
runtimes. Considering the first data set (Dl), the ev- 
idence for model Div23 is larger when compared to 
the evidence for Divl3. For the second data set (D2), 
the situation is reversed, reflecting chance events with 
which the simulated data sets were generated. The ABC 
errors in Figure |3j3 reveal that, only, Divl3 matches 
data set D2 reasonably well. Model Divl3 shows larger 
discrepancies on this data set, which is in agreement 
with the Bayes factor. Turning to data set Dl, the ABC 
errors show that none of the fitted models are consistent 
with the data, although the errors are slightly smaller 
under model Div23, again in agreement with the Bayes 
factor. 



7 Discussion 

We showed how existing ABC algorithms can be mod- 
ified to aid targeted model refinement on the fly, and 
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illustrated these algorithms on examples from evolu- 
tionary biology and dynamic systems. Underlying this 
"workhorse machinery is a simple re-interpretation of 
ABC as a particular data augmentation method on er- 
ror space, and the presented algorithms are an imme- 
diate consequence of this rcintcrprctation. Previously, 
we recognized the utility of the ABC errors as (un- 
known but estimable) compound random variables that 
may reflect both stochastic fluctuations and system- 
atic discrepancies between a model and the data ( |Rat 



tic in that they change in different directions as 9 is 
changed. In case of model mismatch, we thus expect 
an irreconcilable conflict between components of £% : k 



(Robert et al 2009 Ratmann et al 2010). This multi- 



mann et al 2009 1 . There are two elementary points to 



make. The joint distribution of ABC errors faith 
fully reflects dependencies among different aspects of 
the full, intractable data. Indeed, the product ABC ker- 
nel that is used here does not confound the relation be- 
tween multiple, lower-dimensional projections ^ on er- 
ror space. Thus, each dimension of the ABC error den- 
sity retains an intrinsic meaning and can be used to di- 
agnose specific model deficiencies. Second, the ABC er- 
rors are already computed by existing ABC algorithms 
(see Tables l][3) and there is no further computational 
cost associated for the purpose of model assessment. 

The ABC error density ([7]) provides an exploratory / s k ~K T [e^ .^\xo)de 1 .^ 
rather decisional tool for targeted, iterative model re- J 
finement. One obstacle in using ^ more formally for 
the purpose of model comparison is that the ABC errors 
have an intrinsic scale that is model-dependent, and 
are often not directly comparable. However, as there is 
no direct connection or convergence of approximations 
to the Bayes factor to the true Bayes factor between 



dimensional perspective leverages the power of the ABC 
error density in uncovering existing discrepancies con- 
siderably. More precisely, it follows from the results in 
Section [2] that the expected ABC error vector 

E tA £ 1-.k\ X 0) = ( J £k^T(£l:K\ X 0)d£l:K) 1 R 

is in each dimension weighted according to the "mutual 
constraints" ]J k n(E k ; r k ) : 

J £ k n T (e UK \x )de 1 . K 

p k (S k (x), Sk(xo)) Y\_K\Pj(Sj(x), Sj(x ));Tj)7r(dx). 



From a theoretical point, there is no possibility of con- 
flict between summary errors whenever the respective 
summaries are independent of each other, 



Pk(Sk(x), Sk{xo))K^p k (Sk(x),S k (xo));T k ^TT(dx); 



two alternative models (Robert et al 2011 ), there is re 



newed interest in harnessing the information provided 
by Q for ABC model choice. Approximations to the de- 
viance information criterion (Francois and Laval 2011 I 



see the Appendix. As before, conflict is sensitive to the 
choice of the tolerance t and vanishes as r — > oo. Cru- 
cially, conflict may emerge between as few as two co- 
dependent summary errors. We thus often do not re- 
quire a set of sufficient summaries to be able to detect 
existing model discrepancies. Note, however, that ef- 



forts to orthogonalize a set of summaries (Nunes and 



provide an overall measure of goodness of fit that com- 
plements the approach taken here. Clearly, both meth- 
ods are often sensitive to the choice of the ABC thresh- 
olds t in the same way as ABC parameter inference is 
sensitive to r, and caution is warranted. Further work 
on the selection of those thresholds is clearly needed, 



Balding 2010 1 may diminish chances to uncover model 



(2011). 



along the lines drafted by Blum ( 2009 ) and Marin et al 



In general, the ABC error density ([7| does not nec- 
essarily uncover existing model deficiencies. It may be 
that the level of information contained in the data is 
too low to eliminate an inadequate model. An ideal 
setting is when both summaries and ABC thresholds 
can be chosen adequately in order to expose those defi- 
ciencies, but this often is an unachievable goal, if only 
for computational reasons. At this stage, we can only 
make the following simple recommendations. For com- 
plex models in many real- world applications, the errors 
e VK are typically dependent and possibly antagonis- 



deficiency. 

Based on our experience with the algorithms pre- 
sented in Tables [2][3j we further recommend to combine 
mhABC/x and sisABC/i to sample from pj. Intuitively, 
mhABC/i updates only a single particle in relation to 
its previous value, and may lead to rapid convergence 
to the target density. Techniques aiding rapid conver- 
gence are discussed in Ratmann et al (2007). As the 



Metropolis-Hastings sampler might become stuck |Sis- 
son et al ( 2007 1 , we run multiple chains in parallel un- 



til shortly after burn-in. Our acceptance probabilities 
of mhABC/i are typically below 5%, hence the gener- 
ated Markov chains are highly autocorrelated. In our 
case study presented in Table |4j we find that sisABC/i, 
seeded with samples from mhABC/i, may quickly re- 
plenish effective sample sizes because it also proposes 
ancestor indices. We present a small case study in Ta- 
ble |4j There are perhaps two main caveats with this 
hybrid approach. First, all Markov chains generated by 
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mhABC/i might get stuck. However, this is a generic 
feature with all MCMC implementations that can be 
generically attenuated by the annealing nature of ABC^t 
(via the choice of the T nh 's). Second, the acceptance 
probabilities in the rejection control step of sisABC/i 
can be significantly lower than those of mhABC/j, be- 
cause the latter are not computed in relation to previous 
error magnitudes. 



8 Conclusion 

We presented methods and algorithms for exploratory 
model assessment when the likelihood is computation- 
ally intractable. To us, the advantages of this approach 
towards ABC model choice are that sufficient summaries 
are often not required to detect existing discrepancies 
between a model and the data, and that the algorithms 
incur no or little extra computational cost as compared 
to standard ABC methods. Perhaps, the main short- 
comings of this approach may be the scale dependency 
of the ABC errors and their sensitivity to the ABC 
threshold. We believe that, in addition to investigat- 
ing the effects of the ABC approximation to standard 
Bayesian machinery such as the Bayes factor or the de- 
viance information criterion, exploiting the particular 
properties of that approximation may provide comple- 
mentary, useful tools for ABC model choice. 

A Proofs of sections [2] and [7] 

We first derive the marginal density of |6| in 9. By construc- 
tion, it is 

tt abc (%o) oc7r(0) J K(e UK ;r k ) (, Xo>e {de llK ) 

= tt(0) J K^p k (S k (x),S k (x ));r k ^ ^ f(dx\8). 

Consequently, the marginal target densities tt t (8\xo) of algo- 
rithms rejABC and rejABC/x coincide if 

k(p(§,(x),$(x );t) oc Yl K {pk(S k (x),S k (x );T k ) 

k 

(We conjecture this property only happens for the normal 
kernel n and the Euclidean distance p, as well as for the indi- 
cator kernel and the Manhattan distance.) To establish J7J, 
consider the following error measure on the (same) associated 
Borel image cr-algebra under the ABC projection 

tt X0 {E 1 x...xE K )= J Tr{B)C Xo ,e{E 1 x ...xE K )d8 
= II ?r(9) 1 {* e X ■ " X Ek) } /(£fa|e) 



dO 



ifeC( £ i x ■•• x e k)} 
J x 1 { xe Co 1 ^ x ■ • • x ^ 



tt(0) f{dx\8) d6 



Here, ir(dx) is the prior predictive distribution of the data 
( |Box| [1980 I, and TT Xo (de 1 . K ) can thus be interpreted as the 
prior predictive if-dimensional error measure under the ABC 
projection. We assume that 7T Xo admits a density (also de- 



noted by) tt Xo : 



the same measure as QSp , and define the algorithm outcome 
as 

7T t{s 1:K \x ) oc ]^[/«(e fe ;T fc ) 7r Xo (e UK ). 

k 

We previously derived the same relations under a Riemann in- 
terpretation of the integrals involved ( jRatmann et al| |2009| l . 
The Lebesgue approach presented here is much less convo- 
luted. 

Finally, we establish the intuitive result that conflict can- 
not emerge between independent summary errors. If the sum- 
mary errors in ABC are independent of each other, 

¥ e , X0 (E 1 ,...,E K ) = J t{xe^(E 1 x...xE K )}f(dx\8) 

factorizes componentwise. Likewise, the prior predictive K- 
dimensional error density Tr Xo (e 1 . K ) admits the factorization 

"so^iix) = Hk=i ^k,x a {sk)- Since the ABC kernel is here 
assumed to factorize as well, we obtain 

TT(e 1:K |i(i) 

= 11 TTk,x a {e k )^(e k ;T k ) / 
fc=l ' 

■ ■ Yl "■fc.fo^iiK ) K ( £ fe; r fe) ds! ... de K 
•* k=l 

k . K 

= Y\ 7r fe,2:o( e fe) K ( e fe; T fe) / II / 7I "fc,^ ( £ fc) K ( £ fe' T fc) de k 

fc=l ' k=l J 

K 

= Yl K k ,T k (e h \xo). 
fc=l 

Therefore, the fcth component of the vector-valued mean ABC 
error E,r (e^^l^o) collapses to 

K 

Sk Y[ 7T k, Tk (s k \x ) de 1:K 
fe=l 

= j £k ^k,T k (sk\x ) J " J n ""i.tj^jI 350 ) de - k 

je-k 

= [ £k ^k,T k {s k \x )de k 

Pk(Sk{x),S k {x )) Tre k (pk{Sk(x),S k (x ));Tk) n(dx). 



s 1 . K — > tt Xo (e 1 . K ) with respect to 



B Simulation from the network models 

Each of the considered network evolution models defines a 
discrete-state discrete time Markov chain of growing networks. 
For network data of some organism, we grow a network ac- 
cording to model-specific transition probabilities until the 
number of nodes in the network equals the number of genes in 
the genome of that organism. Transition probabilites are im- 
plicitly defined by the following probabilistic rules. The pref- 
erential attachment (PA) model adds a new node to an exist- 
ing node with probability that is proportional to the node de- 
gree of the existing node. In the duplication-divergence model 
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(DD), a parent node is randomly chosen and its edges are 
duplicated. For each parental edge, the parental and dupli- 
cated one are then lost with probability ^Div each, but not 
both; moreover, at least one link is retained to any node. 
The parent node may be attached to its child with proba- 
bility 5a- The mixture model DD+PA either performs PA 
with probability a, or DD with probability I — a. Model 
DD+LNK+PA is a mixture of PA, DD with S A = 0, link 
addition and deletion. Link addition (deletion) proceeds by 
choosing a node randomly, and attaching it preferentially to 
another node (deleting it preferentially from its interaction 
partners). Unnormalized mixture weights are calculated as 
follows. For duplication-divergence, the rate Ad up is multi- 
plied by the order of the current network; for link addition, 
the rate AAdd is multiplied by ( r 2 er ) — Size; for link dele- 
tion, the unnormalized weight of link addition is multiplied by 
Anei ■ Preferential attachment occurs at a constant frequency 
a, and the weights of duplication, link addition and link dele- 
tion are normalized so that their sum equals 1 — a. Each of the 
components is chosen according to the normalized weights. 

We consider here two alternative observation models to 
account for missing data. Denote the number of proteins in 
the observed data set xq with n(xo), the number of bait pro- 
teins with ribait^o) < n(xo), the number of prey proteins 
with n pre y(rr.o) < ti(xq), the number of links with m(xo) and 
the fully simulated network with x. Link subsampling (L): 
Set x to be empty and repeat until the number of links in 
x is equal or larger than m(xo), or no more links can be 
added: pick a link (it, v) at random and without replacement 
from x and add the nodes u, v and the link (it, v) to x. Bait- 
prey subsampling (BP): Create two random lists of bait and 
prey proteins. Set x to be empty and repeat until the number 
of baits nb a it(2;) equals ribait^o) and the number of preys 
n-p Tey {x) is equal or larger than n prey (:ro), or no more links 
can be added: pick a link (u, v) at random and without re- 
placement from x such that it is in the bait list and v is in 
the prey list, mark u as a bait and v as a prey protein and 
add the nodes it, u and the link (it, v) to x. 



C Simulation from the stochastic SIRS model 



The Euler-Maryuama algorithm is incremented by 0.5 days. 

The observation model used here accounts for seasonal 
differences in the true positive rate ft of reported influenza- 
like illness cases that are subsequently confirmed as true in- 
fluenza cases with virological analyses. Effectively, our model 
inflates simulated summer incidence to larger values, because 
typically less than 5% of reported influenza-like illness cases 
are confirmed during the summer period. We do not have 
access to the known true positive rate in the Netherlands. In- 
specting available incidence data and true positive rates from 
the U.S. between 1997-2008 ( [http://www.cdc.gov/nu/weeklyl 
/fluactivitysurv.htm), we find it is possible to predict ft in 
one season reasonably accurately from the timing of peak in- 
cidence in the same season. 



D Summaries for the stochastic SIRS model 

We investigated a much larger number of candidate sum- 
maries, and found that the set (CDF-ZiPK, ACF-Z\PK,M- 
EXPL,CDF-PK,M-ATTR) is sufficient to expose model de- 
ficiencies against the observed data and to estimate model 
parameters accurately in simulation studies. We used the 
following distance functions for each of the summaries. For 
CDF-ZiPK, we compute the Cramer-von-Mises test statistic. 
For ACF-ZiPK, we compute the log ratio of the observed 
and simulated autocorrelation at lag 2 to reproduce influenza 
A (H3N2)'s weak biennial oscillation. For M-EXPL and M- 
ATTR, we compute the log ratio of the observed and simu- 
lated means. The summary CDF-PK is subject to consider- 
able volatility upon re-simulation, which precluded the use of 
the the Cramer-von-Mises test statistic. Most of the param- 
eter space results in strongly bimodal peak distributions. To 
identify the small parameter space for which more gradual 
peak distributions are obtained, we found it most efficient to 
query the discrepancies between the two cumulative distribu- 
tion functions at peak size 200 and 400 in terms of the log 
ratio. Here, we use log ratios and tail area probabilites rather 
than difference functions so that the resulting ABC errors 
have some intrinsic interpretability. 



Assuming that the infinitesimal probability of either simulta- 
neous or multiple transitions between compartments is neg- 
ligible, the transmission rates |8]|9| define a finite-state con- 
tinuous time Markov chain that accounts for demographic 
stochasticity. To adjust for long-term demographic trends in 
the Netherlands, we fix N to historical population data ob- 
tained from (http://statline.cbs.nl/statweb/), and set l//i = 
80. To avoid stochastic extinction, we add a small, constant 
number of infected visitors I v which can be interpreted as 
the average number of infected travelers that are visiting the 
Netherlands at any day. The first equation in |s| is thus 

— = n(N-S)-p t -(I + Iv)+j(N-S-I). 
dt N 

The value is set to the expected number of infected trav- 
elers for a given model parameterization. More precisely, we 
estimate the average total number of international travelers in 
the Netherlands at 5m/yr from available tourist information 
(http://statline.cbs.nl/statweb/) and set I v the proportion of 
international travelers that would be infected at any day at 
endemic equilibrium, 



/" + 7 



E Simulation from the population genetic 
models 



M + 7- 



(1 - 1/Ro) x 5 x 10 6 /365. 



We used the DIYABC software (see |Cornuet et~a3| |2008[ ) to 
produce simulations from model Divl3 and Div23. The geno- 
types of the simulated samples are drawn independently for 
each locus. Following the population history of the model and 
assuming no natural selection, the gene genealogy (which can 
be displayed as a dendrogram rooted at the most recent com- 
mon ancestor) is given by time-continuous Coalescent pro- 
cess. Coalescence rate in the genealogy is governed by the 
effective population size N e : in a population, if k ancestors of 
the sample remain at a given time, a coalescent event (joining 
two branches of the dendrogram chosen at random) occurs af- 
ter an exponential time with rate k(k — 1)/ (2N e ). Hence the 
number of branches over (backward) time in a population 
evolves according to a pure death Markov process, jumping 
from state k to state k — 1 with rate k(k — l)/(2iV e ). Condi- 
tionally on the genealogy of this locus, mutation rate (/i) over 
each branch of the dendogram is assumed to be 0.005. We can 
then genotype the simulated sample, starting from the geno- 
type of the most recent common ancestor and respecting the 
mutation model described in Section [6] 
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We sum up the whole simulated data set with the 24 usual 
summary statistics given in |Robert et al| < |2011 1. In the simple 
situation considered here, it is possible to apply the rejection 
algorithm, and in fact, we here re-evaluated the same pseudo 
data sets that were also used in (jRobert et aT| [201 1 ) . The 
acceptance function k, see Table [1| is a product of 24 one- 
dimensional indicator functions centered around 0. And the 
tolerances Tfc are tuned so that we keep in Model Div23 about 
1000 simulated data sets from the 10 simulated data sets 
of the reference table. Our conclusions on model assessment 
rely the error distribution of six summary statistics, which 
are correlated with the distances in genetic variation between 
populations. 
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